Abstract

By the method of Poissonization we confirm some existing results concerning
consistent estimation of the structural distribution function in the situation of a
large number of rare events. Inconsistency of the so-called natural estimator is
proved. The method of grouping in cells of equal size is investigated and its
consistency derived. A bound on the mean squared error is derived.

1 Introduction and results

The concept of a structural distribution function originates from linguistics. Let $M$
denote the size of the vocabulary of an author and consider a text of this author that
contains $n$ words. Every choice of a word in the text from the vocabulary can be seen as
the realization of a multinomial random vector. The whole text consists of a sequence
of such choices $X^{(i)} = (X^{(i)}_1, \ldots, X^{(i)}_M)$, $i = 1, 2, \ldots, n$, which are
assumed to be independent. So each $X^{(i)}$ is Multinomial$(1, p_{1,M}, p_{2,M}, \ldots, p_{M,M})$
distributed, where $p_{1,M}, p_{2,M}, \ldots, p_{M,M}$ denote the cell probabilities. In linguistics
the vector of those word probabilities is viewed as a characteristic of the author. More
specifically one is interested in estimating the so-called structural distribution function.

Definition 1.1 The Structural Distribution Function $F_M$ is the empirical distribution
function based on $M$ times the cell probabilities. Hence

$$F_M(x) = \frac{1}{M} \sum_{j=1}^{M} I_{[M p_{j,M} \le x]}. \tag{1.1}$$
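As a small illustration of Definition 1.1, the following sketch evaluates the structural distribution function for a given vector of cell probabilities. The function name `structural_df` and the toy probability vector are assumptions made for this example only.

```python
from typing import Sequence


def structural_df(probs: Sequence[float], x: float) -> float:
    """Evaluate F_M(x) = (1/M) * #{j : M * p_j <= x} as in Definition 1.1."""
    M = len(probs)
    return sum(1 for p in probs if M * p <= x) / M


if __name__ == "__main__":
    # Hypothetical toy vocabulary with M = 4 cells (not taken from the paper).
    probs = [0.5, 0.25, 0.125, 0.125]
    for x in (0.5, 1.0, 2.0):
        print(x, structural_df(probs, x))
```

For uniform cell probabilities every value $M p_{j,M}$ equals one, so $F_M$ is then degenerate at one.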

We will investigate the estimation problem for the case of a large number of rare
events, i.e. we assume

$$n, M \to \infty \quad\text{and}\quad n/M \to \lambda, \quad\text{where } 0 < \lambda < \infty. \tag{1.2}$$

So in the linguistic context both the size of the text and the size of the vocabulary are
large, and the text size is proportional to the size of the vocabulary. Assuming that,
under (1.2), $F_M$ converges weakly to a distribution function $F$, we want to estimate $F$
at a fixed positive point $x$. The problem of estimation of $p_{1,M}, p_{2,M}, \ldots, p_{M,M}$
is thus asymptotically replaced by estimation of $F$.

The estimators we consider are based on the cell counts of the $n$ observations of $X$,
i.e.

$$\nu_{j,M} = \sum_{i=1}^{n} X^{(i)}_j, \quad j = 1, 2, \ldots, M. \tag{1.3}$$

Since the cell probabilities can be estimated by the cell frequencies, an obvious estimator
of $F$ seems to be the natural estimator $\hat F_M$, which is defined as the empirical
distribution function based on $M$ times the cell frequencies $\nu_{j,M}/n$. Hence

$$\hat F_M(x) = \frac{1}{M} \sum_{j=1}^{M} I_{[\frac{M}{n}\nu_{j,M} \le x]}. \tag{1.4}$$
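The natural estimator can be sketched in a few lines. The helper names `sample_cell_counts` and `natural_estimator` are hypothetical, and the multinomial sample is drawn with the standard library's `random.choices`, which is just one convenient way to simulate the word choices.

```python
import random
from collections import Counter
from typing import List, Sequence


def sample_cell_counts(probs: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Draw n words independently and return the multinomial cell counts nu_j."""
    M = len(probs)
    tally = Counter(rng.choices(range(M), weights=probs, k=n))
    return [tally.get(j, 0) for j in range(M)]


def natural_estimator(counts: Sequence[int], x: float) -> float:
    """Natural estimator (1.4): fraction of cells with (M/n) * nu_j <= x."""
    M, n = len(counts), sum(counts)
    # Compare M * nu_j with n * x to avoid a division.
    return sum(1 for c in counts if M * c <= n * x) / M


if __name__ == "__main__":
    rng = random.Random(0)
    counts = sample_cell_counts([0.25] * 4, 100, rng)
    print(counts, natural_estimator(counts, 1.0))
```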

The method of Poissonization is based on the following idea. Instead of considering
the cell counts based on $n$ observations of $X$, we introduce the cell counts $\rho_{j,M}$
based on $N$ observations of $X$, where $N$ is a Poisson($n$) distributed random variable
independent of the $X$'s. So

$$\rho_{j,M} = \sum_{i=1}^{N} X^{(i)}_j, \quad j = 1, 2, \ldots, M. \tag{1.5}$$

The advantage of Poissonization is that the $\rho_{j,M}$ are independent Poisson$(np_{j,M})$
random variables, while $(\nu_{1,M}, \ldots, \nu_{M,M})$ are Multinomial$(n, p_{1,M}, p_{2,M}, \ldots, p_{M,M})$
distributed. The natural estimator based on $\rho_{1,M}, \rho_{2,M}, \ldots, \rho_{M,M}$, denoted
by $\tilde F_M(x)$, is then equal to

$$\tilde F_M(x) = \frac{1}{M} \sum_{j=1}^{M} I_{[\frac{M}{n}\rho_{j,M} \le x]}. \tag{1.6}$$
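The independence property can be used directly in a simulation: instead of first drawing $N$ and then a multinomial sample, one may draw each cell count as an independent Poisson variable with mean $np_{j,M}$. The sketch below does this with Knuth's multiplication method, which is an assumption of convenience (it is only suitable for the small means that occur with rare events); all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def poisson_draw(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means such as n * p_j."""
    if mean <= 0.0:
        return 0
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poissonized_counts(probs: Sequence[float], n: int, rng: random.Random) -> List[int]:
    """Independent Poisson(n * p_j) cell counts, one per cell."""
    return [poisson_draw(n * p, rng) for p in probs]


def poissonized_natural_estimator(counts: Sequence[int], n: int, x: float) -> float:
    """Fraction of cells with (M/n) * rho_j <= x; note the scaling uses n, not N."""
    M = len(counts)
    return sum(1 for c in counts if M * c <= n * x) / M
```

Note that the total of the Poissonized counts is random, which is why the sample size $n$ has to be passed to the estimator explicitly.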
Let $Z_M$ denote a random variable with distribution function $F_M$ and $Z$ a random
variable with distribution function $F$. The following theorem establishes the inconsistency
of the natural estimator. This has already been proved by Klaassen and Mnatsakanov
(2000) without using Poissonization.

Theorem 1.1 Let (1.2) hold and let $F_M \xrightarrow{w} F$ (or equivalently $Z_M \xrightarrow{D} Z$). Then

$$\hat F_M(x) \xrightarrow{P} F_{Y/\lambda}(x), \tag{1.7}$$

where the conditional distribution of $Y$ given $Z = z$ is Poisson$(\lambda z)$, for positive $z$,
and of $Y$ given $Z = 0$ is degenerate at zero.

Inconsistency of $\hat F_M$ also follows from the fact that it is a distribution function with
jumps only at multiples of $M/n$. Hence, in the limit, it can only have mass at multiples
of $1/\lambda$. However, knowledge of the limit is useful since, based on the exact limit given
by Theorem 1.1, Klaassen and Mnatsakanov (2000) have constructed a consistent estimator
of $F$ by Laplace inversion.

The inconsistency of the natural estimator seems to occur since $n$ increases too slowly
with regard to the number of cells $M$. We can reduce that number by replacing the $M$
cells by $m$ groups and assuming $n/m \to \infty$. We define the grouped cell probabilities
$q_{j,M}$ by

$$q_{j,M} = \sum_{i=k_{j-1}+1}^{k_j} p_{i,M}, \quad j = 1, 2, \ldots, m, \tag{1.8}$$

and the grouped cell counts $\bar\nu_{j,M}$ as

$$\bar\nu_{j,M} = \sum_{i=k_{j-1}+1}^{k_j} \nu_{i,M}, \quad j = 1, 2, \ldots, m, \tag{1.9}$$

where the cell limits $k_j$, $j = 0, 1, \ldots, m$, are integers such that
$0 = k_0 < k_1 < \ldots < k_m = M$. We restrict ourselves to the situation where the $m$
groups are of equal size $k$, so $M = km$ and $k_j = jk$.

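Let $F_m$ denote the empirical distribution function based on $m$ times the grouped cell
probabilities. So

$$F_m(x) = \frac{1}{m} \sum_{j=1}^{m} I_{[m q_{j,M} \le x]}. \tag{1.10}$$

Define the estimator $\hat F_m(x)$ based on the grouped cell counts by

$$\hat F_m(x) = \frac{1}{m} \sum_{j=1}^{m} I_{[\frac{m}{n}\bar\nu_{j,M} \le x]}. \tag{1.11}$$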
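Grouping in cells of equal size $k$ amounts to summing consecutive blocks of counts and applying the natural estimator to the $m = M/k$ grouped counts. The following sketch makes this explicit; `group_counts` and `grouped_estimator` are hypothetical names.

```python
from typing import List, Sequence


def group_counts(counts: Sequence[int], k: int) -> List[int]:
    """Sum the cell counts over m = M / k consecutive groups of equal size k."""
    if k <= 0 or len(counts) % k != 0:
        raise ValueError("the group size k must divide the number of cells M")
    return [sum(counts[i:i + k]) for i in range(0, len(counts), k)]


def grouped_estimator(counts: Sequence[int], k: int, x: float) -> float:
    """Estimator (1.11): fraction of groups with (m/n) * grouped count <= x."""
    grouped = group_counts(counts, k)
    m, n = len(grouped), sum(grouped)
    return sum(1 for c in grouped if m * c <= n * x) / m
```

With `k = 1` the function reduces to the natural estimator, in line with the remark that $m = M$ gives back the ungrouped case.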
The Poissonized version $\tilde F_m(x)$, based on the grouped Poisson counts

$$\bar\rho_{j,M} = \sum_{i=k_{j-1}+1}^{k_j} \rho_{i,M}, \quad j = 1, 2, \ldots, m, \tag{1.12}$$

is obtained by replacing the grouped counts $\bar\nu_{j,M}$ by $\bar\rho_{j,M}$ in (1.11). Note
that $\bar\rho_{j,M}$ has a Poisson$(nq_{j,M})$ distribution and that the $\bar\rho_{j,M}$ are
independent. Note also that for $m = M$ and hence $k = 1$, a situation excluded by
condition (1.13) below, we regain the natural estimator $\hat F_M(x)$.

The following theorem establishes the weak consistency of the estimator based on 
the grouped counts. 



Theorem 1.2 Let (1.2) hold. Assume further that

$$\frac{n}{m \log m} \to \infty. \tag{1.13}$$

If $F_m \xrightarrow{w} F$ and the distributions induced by the $F_m$ are concentrated on a
fixed bounded set, then

$$\hat F_m(x) \xrightarrow{P} F(x), \tag{1.14}$$

for every continuity point $x$ of $F$.

Let us sketch the proofs of the two theorems. The proofs consist of three parts.
We have to derive the limit of the expectation of the Poissonized estimator, we have
to show that the variance of the Poissonized estimator vanishes asymptotically, and we
have to prove that Poissonization is allowed, i.e. that the difference between the original
estimator and its Poissonized version asymptotically vanishes in probability. Here we
only derive the limits of the expectation. The complete proofs are given in Section 2.

We can rewrite the expectation of $\tilde F_m(x)$ as follows

$$E\tilde F_m(x) = E\,\frac{1}{m}\sum_{j=1}^{m} I_{[\frac{m}{n}\bar\rho_{j,M} \le x]} = \frac{1}{m}\sum_{j=1}^{m} P\Big(\frac{m}{n}\bar\rho_{j,M} \le x\Big). \tag{1.15}$$

Recall that for $m = M$ this gives the expectation of the Poissonized natural estimator
$\tilde F_M(x)$.

Now consider a two stage procedure. We draw a value $z$ from the sequence of
points $mq_{1,M}, mq_{2,M}, \ldots, mq_{m,M}$ with equal probability $1/m$. The corresponding
random variable is denoted by $Z_m$. Note that it has distribution function $F_m$. Given
$Z_m = z$ the random variable $Y_m$ is equal to $m/n$ times a Poisson$(\frac{n}{m}z)$ distributed
random variable. Then we have by conditioning on $Z_m$

$$E\tilde F_m(x) = \frac{1}{m}\sum_{j=1}^{m} P\Big(\frac{m}{n}\bar\rho_{j,M} \le x\Big) = E\big(P(Y_m \le x \mid Z_m)\big) = P(Y_m \le x). \tag{1.16}$$
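Because the grouped Poisson counts have known Poisson$(nq_{j,M})$ distributions, the expectation of the Poissonized estimator can be evaluated exactly as an average of Poisson distribution functions. The sketch below does so by direct summation; the names `poisson_cdf` and `expected_poissonized` are hypothetical, and the summation is only meant for moderate means (for very large means `exp(-mean)` underflows).

```python
import math
from typing import Sequence


def poisson_cdf(k: int, mean: float) -> float:
    """P(Poisson(mean) <= k), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(total, 1.0)


def expected_poissonized(q: Sequence[float], n: int, x: float) -> float:
    """(1/m) * sum_j P((m/n) * Poisson(n * q_j) <= x), the mixture P(Y_m <= x)."""
    m = len(q)
    k = math.floor(n * x / m)  # (m/n) * count <= x  iff  count <= n * x / m
    return sum(poisson_cdf(k, n * qj) for qj in q) / m
```

For instance, with four equal groups and $n = 12$ every grouped count is Poisson(3), so the value at $x = 1$ is $P(\text{Poisson}(3) \le 3) = 13e^{-3}$.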
Hence $E\tilde F_m(x)$ equals the distribution function of $Y_m$ at $x$. We derive weak
convergence of this distribution function by the continuity theorem for characteristic
functions. The characteristic function of $Y_m$, denoted by $\phi_m$, is given by

$$\phi_m(t) = E(e^{itY_m}) = E(E(e^{itY_m} \mid Z_m)) = \int e^{\frac{n}{m} z (e^{it\frac{m}{n}} - 1)}\, dF_m(z), \tag{1.17}$$

since the characteristic function of a Poisson($\mu$) distribution is equal to
$e^{\mu(e^{it}-1)}$. In the case of the natural estimator we have $m = M$ and hence by (1.2)

$$\phi_M(t) \to \int e^{\lambda z (e^{it/\lambda} - 1)}\, dF(z), \tag{1.18}$$

the characteristic function of the limit distribution function in (1.7). For the estimator
based on the grouped counts we have $m/n \to 0$ by (1.13) and hence

$$\phi_m(t) \to \int e^{itz}\, dF(z), \tag{1.19}$$

the characteristic function of $F$. By the continuity theorem (1.18) and (1.19) imply the
conclusions of the two theorems.
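The different behaviour of the characteristic function in the two regimes can be checked numerically. The sketch below evaluates the integrand representation of the characteristic function of $Y_m$ for a mixing distribution that puts equal mass on a finite list of points, as a function of the ratio $m/n$. The function names and the chosen points are assumptions for illustration.

```python
import cmath
from typing import Sequence


def phi_m(t: float, z_values: Sequence[float], ratio: float) -> complex:
    """Average of exp((z / ratio) * (exp(i * t * ratio) - 1)) over z_values; ratio = m / n."""
    total = 0j
    for z in z_values:
        total += cmath.exp((z / ratio) * (cmath.exp(1j * t * ratio) - 1))
    return total / len(z_values)


def phi_limit(t: float, z_values: Sequence[float]) -> complex:
    """Characteristic function of the mixing distribution itself."""
    return sum(cmath.exp(1j * t * z) for z in z_values) / len(z_values)


if __name__ == "__main__":
    zs = [0.5, 1.0, 1.5]
    for ratio in (1 / 3, 1e-2, 1e-4):
        print(ratio, abs(phi_m(1.0, zs, ratio) - phi_limit(1.0, zs)))
```

With `ratio` close to zero (grouping) the difference is small, whereas with `ratio = 1/3` (the natural estimator with $\lambda = 3$) it stays bounded away from zero.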



Remark 1.1 In Theorem 1.2 we can replace the condition $F_m \xrightarrow{w} F$ by
$F_M \xrightarrow{w} F$ if the $p_{j,M}$'s, $j = 1, 2, \ldots, M$, are ordered. A proof can be
found in Section 3.3.

Remark 1.2 The condition of the weak convergence of $F_m$ to $F$ is implied by a stronger
condition in Klaassen and Mnatsakanov (2000). Define $f_M$ by

$$f_M(t) = \sum_{j=1}^{M} M p_{j,M}\, I_{[\frac{j-1}{M} < t \le \frac{j}{M}]}, \quad 0 < t \le 1. \tag{1.20}$$

Note that the structural distribution function $F_M$ is the distribution function of $f_M(U)$,
where $U$ is uniformly distributed on the interval $(0, 1]$. Assume that $f_M$ converges
uniformly on $(0, 1]$ to a density function $f$, i.e.

$$\sup_{0 < t \le 1} |f_M(t) - f(t)| \to 0. \tag{1.21}$$

Klaassen and Mnatsakanov proved, without requiring equal cell sizes, that this condition
implies weak consistency. Moreover, the condition (1.13) is slightly stronger than the
corresponding one required by Klaassen and Mnatsakanov.
Let us consider the rate of convergence and the choice of the number of groups $m$.
Define the Mean Squared Error (MSE) of $\hat F_m(x)$ as

$$\mathrm{MSE}(\hat F_m(x)) = E\,(\hat F_m(x) - F(x))^2. \tag{1.22}$$

A standard computation shows that the mean squared error is equal to the sum of the
squared bias and the variance.

Consider the situation where the $p_{j,M}$'s are generated by a distribution function $G$,
via

$$p_{j,M} = G(j/M) - G((j-1)/M), \quad j = 1, \ldots, M. \tag{1.23}$$

Then we also have $q_{j,M} = G(j/m) - G((j-1)/m)$, $j = 1, \ldots, m$. If $G$ has a density $g$
that is continuous and bounded then we have

$$mq_{j,M} = m\big(G(j/m) - G((j-1)/m)\big) = m\,g(\xi_{j,M})\frac{1}{m} = g(\xi_{j,M}), \tag{1.24}$$

where $\xi_{j,M}$ is a point in the interval $((j-1)/m, j/m]$. Assuming that $g$ is also uniformly
continuous on $(0, 1]$ this implies $f_m(t) \to g(t)$, uniformly on $[0, 1)$. So in this situation
the limit density $f$ in (1.21) is equal to $g$.
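The construction of the cell probabilities from a distribution function can be sketched as follows. The function `cell_probs` is a hypothetical helper, and $G(u) = 2u - u^2$ with density $g(u) = 2(1-u)$ is used as the example, the same choice as in the simulations reported in this section.

```python
from typing import Callable, List


def cell_probs(G: Callable[[float], float], M: int) -> List[float]:
    """Cell probabilities p_j = G(j/M) - G((j-1)/M), j = 1, ..., M."""
    return [G(j / M) - G((j - 1) / M) for j in range(1, M + 1)]


def G_example(u: float) -> float:
    """Example distribution function on [0, 1]."""
    return 2 * u - u * u


def g_example(u: float) -> float:
    """Density of G_example."""
    return 2 * (1 - u)


if __name__ == "__main__":
    m = 10
    q = cell_probs(G_example, m)
    # m * q_j should lie between the values of g at the end points of the j-th interval.
    for j, qj in enumerate(q, start=1):
        print(j, m * qj, g_example(j / m), g_example((j - 1) / m))
```

Since the increments of $G$ telescope, summing $k$ consecutive values of `cell_probs(G, M)` gives the grouped probabilities `cell_probs(G, M // k)` up to rounding error.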



Let us first present some simulation results. Figures 1, 2 and 3 show estimates of $F$
based on a simulated sample where $G(x) = 2x - x^2$ and $g(x) = 2(1 - x)$ for $0 < x < 1$.
We have chosen $M = 1000$ and $n = 3000$. So $\lambda$ equals three. Since it equals the
distribution function of $g(U)$, with $U$ uniformly distributed on $[0, 1)$, the limit
structural function $F$ is given by

$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ \frac{1}{2}x & \text{if } 0 \le x \le 2, \\ 1 & \text{if } x > 2. \end{cases} \tag{1.25}$$

Figure 1 shows the result of the natural estimator. Next we show two figures of estimates
based on grouping. In Figure 2 we have $k = 25$ and thus $m = 40$ while for Figure 3 we
have chosen $k = 100$ and thus $m = 10$. Figure 1 shows that the natural estimator is
inconsistent, having jumps only at multiples of $1/\lambda = 1/3$. Figures 2 and 3 show that
by grouping we achieve consistency, and that the choice of $m$ is important. All in all
the figures suggest that $k$ too small or too large is not wise and that there might be an
optimal cell size.
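A simulation of this kind can be sketched with the standard library alone. The code below is not the authors' program; it is a minimal reimplementation under the stated choices $G(x) = 2x - x^2$, $M = 1000$, $n = 3000$ and $k \in \{1, 25, 100\}$, with hypothetical function names and an arbitrary seed.

```python
import random
from collections import Counter
from typing import List, Sequence


def G(u: float) -> float:
    return 2 * u - u * u


def simulate_counts(M: int, n: int, rng: random.Random) -> List[int]:
    """Multinomial cell counts for cell probabilities G(j/M) - G((j-1)/M)."""
    probs = [G(j / M) - G((j - 1) / M) for j in range(1, M + 1)]
    tally = Counter(rng.choices(range(M), weights=probs, k=n))
    return [tally.get(j, 0) for j in range(M)]


def grouped_estimate(counts: Sequence[int], k: int, x: float) -> float:
    """Grouped estimator with group size k; k = 1 gives the natural estimator."""
    grouped = [sum(counts[i:i + k]) for i in range(0, len(counts), k)]
    m, n = len(grouped), sum(grouped)
    return sum(1 for c in grouped if m * c <= n * x) / m


def limit_df(x: float) -> float:
    """Limit structural distribution function F(x) = x / 2 on [0, 2]."""
    return min(max(x / 2, 0.0), 1.0)


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed
    counts = simulate_counts(1000, 3000, rng)
    for x in (0.5, 1.0, 1.5):
        row = [grouped_estimate(counts, k, x) for k in (1, 25, 100)]
        print(x, limit_df(x), row)
```

For $k = 1$ the estimate is constant between consecutive multiples of $1/3$, whatever the sample, which is the step pattern described for Figure 1.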

The next theorem gives some insight into the choice of $m$. It gives bounds on the mean
squared error of $\hat F_m(x)$. These bounds depend on $m$.

Figure 1: $\hat F_M(x)$ for M = 1000, n = 3000 (m = M = 1000, k = 1)

Theorem 1.3 Let (1.2) hold. Assume that the cell probabilities $p_{j,M}$, $j = 1, \ldots, M$, are
generated by a distribution function $G$ as in (1.23) and that $G$ has a density $g$ that is
uniformly continuous on $(0, 1]$. Assume further that $G$ has a bounded second derivative
$g'$ that is bounded away from zero on $(0, 1]$, and that, for some $0 < \alpha < 1/6$,

$$\frac{n}{m (\log m)^{1/(2\alpha)}} \to \infty. \tag{1.26}$$

Then we have, if $m \gg n^{1/3}$,

$$\mathrm{MSE}(\hat F_m(x)) \le \frac{9}{4\pi^2}(24r)^{4/3}\Big(\frac{m}{n}\Big)^{2/3} + \frac{1}{4m} + o\Big(\Big(\frac{m}{n}\Big)^{2/3}\Big) + o\Big(\frac{1}{m}\Big), \tag{1.27}$$

and if $m \ll n^{1/3}$

$$\mathrm{MSE}(\hat F_m(x)) \le \frac{1}{4m} + o\Big(\frac{1}{m}\Big). \tag{1.28}$$

The key idea of the proof is to exploit the fact that we have derived the convergence
of $E\tilde F_m(x)$, which is in fact equal to the distribution function of $Y_m$, to $F(x)$ from
the convergence of its characteristic function $\phi_m$, cf. (1.17), to the characteristic
function of $F$. By Esseen's smoothing lemma we get a bound on the distance of distribution
functions from the distance of their characteristic functions. By expanding (1.17) we
obtain a rate of convergence for the bias $E\tilde F_m(x) - F(x)$ of the Poissonized estimator.
The bound on the variance of the Poissonized estimator is the same as in the proof
of Theorem 1.2. The remainder of the proof consists of showing that Poissonization is
allowed in this context too.

Figure 2: $\hat F_m(x)$ based on grouping with m = 40, k = 25, M = 1000, n = 3000



Straightforward calculations show that the right hand side of (1.27) is asymptotically
minimized by $m_n$ if

$$m_n = \Big(\frac{\pi^6}{6^3 (24r)^4}\Big)^{1/5} n^{2/5}. \tag{1.29}$$

This gives a mean squared error

$$\mathrm{MSE}(\hat F_{m_n}(x)) \le \frac{5}{4}\Big(\frac{3\sqrt{3}\,(24r)^2}{2\pi^3}\Big)^{2/5} n^{-2/5} + o(n^{-2/5}). \tag{1.30}$$



The bound (1.28) of Theorem 1.3 gets smaller as $m$ increases. However, the order of $m$
is bounded by $n^{1/3}$. Hence, for $m \ll n^{1/3}$ we get



MSE(F m (x)) > V 1 / 3 + o^n- 1 / 3 ) 



1.31) 



Note that the bound in (1.30) is smaller than the one given in (1.31). Therefore (1.30)
gives the minimal upper bound.
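The trade-off behind the choice of $m$ can be made concrete. Assume, for illustration only, that the leading terms of the bound have the form $A(m/n)^{2/3} + 1/(4m)$ with a generic positive constant $A$ (a placeholder for the constant in front of the bias term). Setting the derivative with respect to $m$ to zero gives $m^{5/3} = 3n^{2/3}/(8A)$, so the minimizer is of order $n^{2/5}$ and the minimal value equals $5/(8m)$. The function names below are hypothetical.

```python
def mse_bound(m: float, n: float, A: float) -> float:
    """Leading terms A * (m/n)^(2/3) + 1/(4m) of the assumed bound."""
    return A * (m / n) ** (2.0 / 3.0) + 1.0 / (4.0 * m)


def optimal_m(n: float, A: float) -> float:
    """Minimizer of mse_bound in m: (3 / (8A))^(3/5) * n^(2/5)."""
    return (3.0 / (8.0 * A)) ** 0.6 * n ** 0.4


if __name__ == "__main__":
    n, A = 1e6, 1.0  # A = 1 is an arbitrary illustrative constant
    m_star = optimal_m(n, A)
    print(m_star, mse_bound(m_star, n, A), 5.0 / (8.0 * m_star))
```

At the minimizer the squared bias term equals $3/(8m)$ and the variance term $1/(4m)$, so both contribute at the same order $n^{-2/5}$.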

Remark 1.3 The assumption that there exists a known ordering of words in a vocabulary,
necessary for grouping, for which (1.23) holds is not realistic. Consistent estimators
such as the one in Klaassen and Mnatsakanov (2000), which do not require such an
ordering, seem to have a logarithmic rate of convergence, as opposed to the algebraic
rate in Theorem 1.2.

Figure 3: $\hat F_m(x)$ based on grouping with m = 10, k = 100, M = 1000, n = 3000

2 Proofs



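2.1 Proof of Theorem 1.1

The limit of $E\tilde F_M(x)$ is derived in the previous section. It remains to check (1.18),
reformulated in the following lemma.

Lemma 2.1 Under the conditions of Theorem 1.1 we have

$$\phi_M(t) = \int e^{\frac{n}{M} z (e^{it\frac{M}{n}} - 1)}\, dF_M(z) \to \int e^{\lambda z (e^{it/\lambda} - 1)}\, dF(z).$$

The proof is given in Section 3.1.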

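A bound on the variance of $\tilde F_M(x)$ is given by

$$\mathrm{Var}(\tilde F_M(x)) = \mathrm{Var}\Big(\frac{1}{M}\sum_{j=1}^{M} I_{[\frac{M}{n}\rho_{j,M} \le x]}\Big) = \frac{1}{M^2}\sum_{j=1}^{M} \mathrm{Var}\big(I_{[\frac{M}{n}\rho_{j,M} \le x]}\big) \le \frac{1}{M^2}\sum_{j=1}^{M}\frac{1}{4} = \frac{1}{4M} \to 0.$$

All this implies that $\tilde F_M$ is weakly consistent for $F_{Y/\lambda}$.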

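Finally we show that Poissonization is allowed. We have

$$|\hat F_M(x) - \tilde F_M(x)| = \Big|\frac{1}{M}\sum_{j=1}^{M} I_{[\frac{M}{n}\nu_{j,M} \le x]} - \frac{1}{M}\sum_{j=1}^{M} I_{[\frac{M}{n}\rho_{j,M} \le x]}\Big| \le \frac{1}{M}|N - n| = \frac{n}{M}\Big|\frac{N}{n} - 1\Big| \to 0, \tag{2.1}$$

almost surely and in probability. This implies that $\hat F_M$ is weakly consistent for
$F_{Y/\lambda}$ too as stated in the theorem.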



2.2 Proof of Theorem 1.2

The limit of $E\tilde F_m(x)$ is derived in the previous section. It remains to check (1.19),
reformulated in the following lemma.

Lemma 2.2 Under the conditions of Theorem 1.2 we have

$$\phi_m(t) = \int e^{\frac{n}{m} z (e^{it\frac{m}{n}} - 1)}\, dF_m(z) \to \int e^{itz}\, dF(z). \tag{2.2}$$

The proof can be found in Section 3.2.

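Here we bound the variance of $\tilde F_m(x)$ as follows

$$\mathrm{Var}\,\tilde F_m(x) = \mathrm{Var}\,\frac{1}{m}\sum_{j=1}^{m} I_{[\frac{m}{n}\bar\rho_{j,M} \le x]} = \frac{1}{m^2}\sum_{j=1}^{m} \mathrm{Var}\, I_{[\frac{m}{n}\bar\rho_{j,M} \le x]} \le \frac{1}{m^2}\sum_{j=1}^{m}\frac{1}{4} = \frac{1}{4m} \to 0. \tag{2.3}$$

This implies that $\tilde F_m(x)$ is weakly consistent for $F(x)$.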

In order to transfer the weak consistency result to the original estimator we must
show that we may indeed Poissonize, i.e. we must show that $\hat F_m(x) - \tilde F_m(x)$
vanishes in probability.

We need the Bernstein inequality for Poisson random variables. If $X$ has a Poisson($\mu$)
distribution then

$$P\Big(\frac{|X - \mu|}{\mu^{1/2}} \ge x\Big) \le 2\exp\Big(-\frac{x^2}{2 + x\mu^{-1/2}}\Big), \tag{2.4}$$

cf. Lemma 8.3.4 in Reiss (1993). It also follows from Inequality 1 on page 485 of Shorack
and Wellner (1986).
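A Bernstein-type bound of this kind is easy to compare with the exact Poisson tail. The sketch below assumes the inequality in the form $P(|X - \mu|/\mu^{1/2} \ge x) \le 2\exp(-x^2/(2 + x\mu^{-1/2}))$ for a Poisson($\mu$) variable $X$; the function names and the test values of $\mu$ and $x$ are illustrative assumptions.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), computed on the log scale."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def exact_two_sided_tail(mu: float, x: float) -> float:
    """P(|X - mu| >= x * sqrt(mu)), obtained as one minus the central mass."""
    t = x * math.sqrt(mu)
    inside = sum(poisson_pmf(k, mu) for k in range(0, int(mu + t) + 2) if abs(k - mu) < t)
    return max(0.0, 1.0 - inside)


def bernstein_bound(mu: float, x: float) -> float:
    """Assumed form of the bound: 2 * exp(-x^2 / (2 + x / sqrt(mu)))."""
    return 2.0 * math.exp(-x * x / (2.0 + x / math.sqrt(mu)))


if __name__ == "__main__":
    for mu, x in ((4.0, 2.0), (9.0, 3.0), (25.0, 0.5)):
        print(mu, x, exact_two_sided_tail(mu, x), bernstein_bound(mu, x))
```

The bound is far from tight for small deviations, but its exponential decay in $x^2$ is what makes the union bound over the $m$ groups work.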

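Write $z_{j,n} = mq_{j,M}$. Note that, since the distributions induced by the $F_m$ are
concentrated on a bounded set, we have $\max_{1 \le j \le m} z_{j,n} \le c$ for some constant
$c > 0$. Hence, for all $\delta > 0$, we have

$$P\Big(\bigcup_{j=1}^{m}\Big\{\Big|\frac{m}{n}\bar\rho_{j,M} - z_{j,n}\Big| > \delta\Big\}\Big) \le \sum_{j=1}^{m} P\Big(\Big|\bar\rho_{j,M} - \frac{n}{m}z_{j,n}\Big| > \frac{n}{m}\delta\Big)$$

$$= \sum_{j=1}^{m} P\Big(\frac{|\bar\rho_{j,M} - nq_{j,M}|}{(nq_{j,M})^{1/2}} > \Big(\frac{n}{m}\Big)^{1/2}\frac{\delta}{z_{j,n}^{1/2}}\Big) \le \sum_{j=1}^{m} 2\exp\Big(-\frac{n}{m}\,\frac{\delta^2}{2z_{j,n} + \delta}\Big)$$

$$\le 2m\exp\Big(-\frac{n}{m}\,\frac{\delta^2}{2c + \delta}\Big) = 2\exp\Big(\log m\Big(-\frac{n}{m\log m}\,\frac{\delta^2}{2c + \delta} + 1\Big)\Big) \to 0, \tag{2.5}$$

by (1.13).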

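By $\max_{1 \le j \le m} q_{j,M} \to 0$ we have $\mathrm{Var}(\bar\nu_{j,M}) = nq_{j,M}(1 - q_{j,M}) \sim nq_{j,M} = \mathrm{Var}(\bar\rho_{j,M})$.
By the Bernstein inequality for binomial random variables, cf. Shorack and Wellner
(1986), p 440, it now follows that for $\delta > 0$

$$P\Big(\bigcup_{j=1}^{m}\Big\{\Big|\frac{m}{n}\bar\nu_{j,M} - z_{j,n}\Big| > \delta\Big\}\Big) \to 0. \tag{2.6}$$

This implies that with probability approaching one we have

$$\Big|\frac{m}{n}\bar\nu_{j,M} - z_{j,n}\Big| \le \delta \quad\text{and}\quad \Big|\frac{m}{n}\bar\rho_{j,M} - z_{j,n}\Big| \le \delta, \quad j = 1, \ldots, m. \tag{2.7}$$

Consequently, (2.7) implies

$$\frac{1}{m}\sum_{j=1}^{m}\big(I_{[z_{j,n} \le x - \delta]} - I_{[z_{j,n} \le x + \delta]}\big) \le \hat F_m(x) - \tilde F_m(x) \le \frac{1}{m}\sum_{j=1}^{m}\big(I_{[z_{j,n} \le x + \delta]} - I_{[z_{j,n} \le x - \delta]}\big). \tag{2.8}$$

By the weak convergence of $F_m$ to $F$, if $x - \delta$ and $x + \delta$ are continuity points of $F$,
the left and right hand side converge to $F(x - \delta) - F(x + \delta)$ and $F(x + \delta) - F(x - \delta)$
respectively. Now, for given $\epsilon > 0$, choose $\delta$ such that $F(x + \delta) - F(x - \delta)$ is smaller
than $\epsilon$ and we have shown

$$P(|\hat F_m(x) - \tilde F_m(x)| > \epsilon) \to 0. \tag{2.9}$$

Hence $\hat F_m(x) - \tilde F_m(x)$ vanishes in probability, proving that Poissonization is allowed.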



2.3 Proof of Theorem 1.3

First we consider the mean squared error of the Poissonized estimator. By a standard
calculation we have

$$\mathrm{MSE}(\tilde F_m(x)) = (E\tilde F_m(x) - F(x))^2 + \mathrm{Var}(\tilde F_m(x)). \tag{2.10}$$
A bound on the variance is already given by (2.3). It is harder to obtain a bound
on the bias. We shall use the convergence of the characteristic function of $Y_m$ to the
characteristic function of $Y$ in the proof of Theorem 1.1 and Esseen's smoothing lemma,
see Feller (1966), Section XIV 3, Lemma 2 on page 538.

Lemma 2.3 (Esseen's smoothing lemma) Let $F$ be a probability distribution function
with vanishing expectation and characteristic function $\phi$. Suppose $F - G$ vanishes
at $\pm\infty$ and that $G$ has a derivative $g$ such that $|g| \le r$. Finally, suppose that $g$ has a
continuously differentiable Fourier transform $\gamma$ such that $\gamma(0) = 1$ and $\gamma'(0) = 0$. Then,
for all $x$ and $T > 0$,

$$|F(x) - G(x)| \le \frac{1}{\pi}\int_{-T}^{T}\Big|\frac{\phi(t) - \gamma(t)}{t}\Big|\, dt + \frac{24r}{\pi T}. \tag{2.11}$$

Now apply this lemma with $F$ equal to the distribution function of $Y_m$ and $G$ equal to
the limit structural distribution function $F$. Note that both distribution functions have
expectation one and that the induced distributions are concentrated on $[0, \infty)$. Then

$$|E\tilde F_m(x) - F(x)| \le \frac{1}{\pi}\int_{-T}^{T}\Big|\frac{E e^{itY_m} - E e^{itZ}}{t}\Big|\, dt + \frac{24r}{\pi T}. \tag{2.12}$$

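Let us first consider the integrand. Write

$$\big|E e^{itY_m} - E e^{itZ}\big| = \Big|\int e^{\frac{n}{m} z (e^{it\frac{m}{n}} - 1)}\, dF_m(z) - \int e^{itz}\, dF(z)\Big|$$

$$\le \Big|\int e^{\frac{n}{m} z (e^{it\frac{m}{n}} - 1)}\, dF_m(z) - \int e^{itz}\, dF_m(z)\Big| \tag{2.13}$$

$$\quad + \Big|\int e^{itz}\, dF_m(z) - \int e^{itz}\, dF(z)\Big|. \tag{2.14}$$

For $n$ large we have

$$e^{\frac{n}{m} z (e^{it\frac{m}{n}} - 1)} = e^{itz - \frac{1}{2}t^2\frac{m}{n}z + \frac{n}{m}zR_n(t)}, \tag{2.15}$$

where $R_n(t) = e^{it\frac{m}{n}} - 1 - it\frac{m}{n} + \frac{1}{2}t^2\frac{m^2}{n^2}$. Note that

$$|R_n(t)| \le \frac{1}{6}|t|^3\frac{m^3}{n^3} \tag{2.16}$$

and that for $w \in \mathbb{C}$, and $|w|$ small enough, we have

$$|e^w - 1| \le 4|w|. \tag{2.17}$$

Hence

$$\big|e^{\frac{n}{m}zR_n(t)} - 1\big| \le 4\,\frac{n}{m}\,z\,|R_n(t)| \le \frac{2}{3}\,z\,\frac{m^2}{n^2}|t|^3. \tag{2.18}$$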
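So we can bound the term (2.13) as follows

$$\Big|\int e^{\frac{n}{m} z (e^{it\frac{m}{n}} - 1)}\, dF_m(z) - \int e^{itz}\, dF_m(z)\Big|$$

$$\le \Big|\int e^{itz - \frac{1}{2}t^2\frac{m}{n}z + \frac{n}{m}zR_n(t)}\, dF_m(z) - \int e^{itz - \frac{1}{2}t^2\frac{m}{n}z}\, dF_m(z)\Big| + \Big|\int e^{itz - \frac{1}{2}t^2\frac{m}{n}z}\, dF_m(z) - \int e^{itz}\, dF_m(z)\Big|$$

$$\le \int\big|e^{-\frac{1}{2}t^2\frac{m}{n}z} - 1\big|\, dF_m(z) + \int\big|e^{\frac{n}{m}zR_n(t)} - 1\big|\, dF_m(z)$$

$$\le \frac{1}{2}\frac{m}{n}t^2\int z\, dF_m(z) + \frac{2}{3}\frac{m^2}{n^2}|t|^3\int z\, dF_m(z) = \frac{1}{2}\frac{m}{n}t^2 + \frac{2}{3}\frac{m^2}{n^2}|t|^3. \tag{2.19}$$

For (2.18) to hold we have tacitly assumed that $(n/m)zR_n(t)$ vanishes for $-T \le t \le T$.
By (2.16) and the fact that $Z_m$ is almost surely bounded by the same constant for all $m$,
it suffices to check that $(m^2/n^2)t^3 \to 0$ for $-T \le t \le T$. Further on in the proof $T$ will
depend on $n$. The condition is satisfied for our two choices of $T_n$ in (2.27) and (2.30).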
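For the first term in (2.12) we get

$$\frac{1}{\pi}\int_{-T}^{T}\frac{1}{|t|}\Big|\int e^{\frac{n}{m} z (e^{it\frac{m}{n}} - 1)}\, dF_m(z) - \int e^{itz}\, dF_m(z)\Big|\, dt \le \frac{1}{\pi}\int_{-T}^{T}\Big(\frac{1}{2}\frac{m}{n}|t| + \frac{2}{3}\frac{m^2}{n^2}t^2\Big)\, dt = \frac{1}{2\pi}\frac{m}{n}T^2 + \frac{4}{9\pi}\frac{m^2}{n^2}T^3. \tag{2.20}$$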



Let the function $f_m$ be defined by

$$f_m(t) = \sum_{j=1}^{m} mq_{j,M}\, I_{[\frac{j-1}{m} < t \le \frac{j}{m}]}, \quad 0 < t \le 1. \tag{2.21}$$

Then $F_m$ is the distribution function of $f_m(U)$ where $U$ is uniformly distributed on $(0, 1]$.
Since $f_m$ converges uniformly to $g$ the limit distribution function $F$ is the distribution
function of $g(U)$. Hence

$$\int e^{itz}\, dF_m(z) - \int e^{itz}\, dF(z) = \int_0^1 \big(e^{itf_m(u)} - e^{itg(u)}\big)\, du. \tag{2.22}$$

Integrated over the intervals $((j-1)/m, j/m]$, the constant $mq_{j,M}$ yields the same value
as $g$ integrated over these intervals. So we can write

$$e^{itf_m(u)} - e^{itg(u)} = e^{itf_m(u)}\big(1 - e^{it(g(u) - f_m(u))}\big) = e^{itf_m(u)}\big(it(f_m(u) - g(u)) + R_n(t, u)\big),$$
where

$$|R_n(t, u)| \le \frac{1}{2}t^2(g(u) - f_m(u))^2. \tag{2.23}$$

And hence, if $g$ has a bounded derivative on $(0, 1]$,

$$\Big|\int_0^1\big(e^{itf_m(u)} - e^{itg(u)}\big)\, du\Big| = \Big|\int_0^1 e^{itf_m(u)}R_n(t, u)\, du\Big| \le \frac{1}{2}t^2\int_0^1 (f_m(u) - g(u))^2\, du \le \frac{c\,t^2}{2m^2},$$

where $c$ is a positive constant. This implies

$$\frac{1}{\pi}\int_{-T}^{T}\frac{1}{|t|}\Big|\int e^{itz}\, dF_m(z) - \int e^{itz}\, dF(z)\Big|\, dt \le \frac{1}{\pi}\int_{-T}^{T}\frac{1}{|t|}\,\frac{c\,t^2}{2m^2}\, dt = \frac{c}{2\pi}\frac{T^2}{m^2}. \tag{2.24}$$

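Hence, for all $x$ and $T > 0$,

$$|E\tilde F_m(x) - F(x)| \le \frac{4}{9\pi}\frac{m^2}{n^2}T^3 + \frac{1}{2\pi}\frac{m}{n}T^2 + \frac{c}{2\pi}\frac{T^2}{m^2} + \frac{24r}{\pi T}. \tag{2.25}$$

First assume that $m \gg n^{1/3}$. Then equation (2.25) becomes asymptotically

$$|E\tilde F_m(x) - F(x)| \le \frac{1}{2\pi}\frac{m}{n}T^2 + \frac{24r}{\pi T}. \tag{2.26}$$

The value $T_n$ that minimizes the right hand side of (2.26) is given by

$$T_n = (24r)^{1/3}\Big(\frac{n}{m}\Big)^{1/3}. \tag{2.27}$$

Hence the bias can be asymptotically bounded by

$$|E\tilde F_m(x) - F(x)| \le \frac{3}{2\pi}(24r)^{2/3}\Big(\frac{m}{n}\Big)^{1/3} \tag{2.28}$$

and the mean squared error by

$$\mathrm{MSE}(\tilde F_m(x)) \le \frac{9}{4\pi^2}(24r)^{4/3}\Big(\frac{m}{n}\Big)^{2/3} + \frac{1}{4m} + o\Big(\Big(\frac{m}{n}\Big)^{2/3}\Big) + o\Big(\frac{1}{m}\Big). \tag{2.29}$$

If $m \ll n^{1/3}$, by minimizing the third and fourth term in (2.25) we get, by choosing

$$T_n = c^{-1/3}(24r)^{1/3}m^{2/3}, \tag{2.30}$$

that asymptotically

$$\mathrm{MSE}(\tilde F_m(x)) \le \frac{9}{4\pi^2}c^{2/3}(24r)^{4/3}m^{-4/3} + \frac{1}{4m} + o(m^{-4/3}) + o\Big(\frac{1}{m}\Big) = \frac{1}{4m} + o\Big(\frac{1}{m}\Big). \tag{2.31}$$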
We have now derived the asymptotic bounds on the mean squared error of the Poissonized
estimator. We will show that Poissonization is allowed. By the triangle inequality
we have

$$\mathrm{MSE}(\hat F_m(x))^{1/2} \le \mathrm{MSE}(\tilde F_m(x))^{1/2} + \big(E\,(\hat F_m(x) - \tilde F_m(x))^2\big)^{1/2}. \tag{2.32}$$

The second term on the right hand side can be dealt with using the following lemma.
Its proof is given in Section 3.4.

Lemma 2.4 Under the conditions of Theorem and we have for any < a < | 

E (F m (x) - F m (x)) 2 = O ((^) 1 " 2 °) +0(±). (2.33) 

By this order bound and (2.29) and (2.31) it follows that $(E(\hat F_m(x) - \tilde F_m(x))^2)^{1/2}$ is
asymptotically negligible compared to $\mathrm{MSE}(\tilde F_m(x))^{1/2}$. Hence Poissonization is allowed.



3 Technical proofs 



3.1 Proof of Lemma 2.1



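Recall that $(n/M)Y_M$, given $Z_M = z$, has a Poisson$(\frac{n}{M}z)$ distribution. We have
$Z_M \xrightarrow{D} Z$, so $F_M(w) \to F(w)$ at all continuity points $w$ of $F$. Let $\psi_M$ denote the
characteristic function of $(n/M)Y_M$. Then

$$\psi_M(t) = E\big(e^{it\frac{n}{M}Y_M}\big) = E\big(E\big(e^{it\frac{n}{M}Y_M} \mid Z_M\big)\big) = \int e^{\frac{n}{M}z(e^{it} - 1)}\, dF_M(z). \tag{3.1}$$

Consider $t$ fixed. For $z \in [0, w]$ we have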



1 _ e (\-fj)z(e^-l)\ ^ M , ( ):(("'-[') 



or equivalently 



,&*(e«-l) _^ A*(e«-l) 



< 1 - eV A_ M- 



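This also holds for $z$ replaced by $z_n$, for every sequence $\{z_n\}$ with values in $[0, w]$,
showing that the convergence is uniform in $z$. Hence for $\epsilon > 0$ and $n$ large enough

$$\big|e^{\frac{n}{M}z(e^{it} - 1)} - e^{\lambda z(e^{it} - 1)}\big| < \epsilon/2 \tag{3.2}$$

for all $z \in [0, w]$.

Let $w$ be a continuity point of $F$. Note that, because $F$ and $F_M$ vanish on the
negative half line, the point $-1$ is also a continuity point. Then, according to the
Helly-Bray theorem and because characteristic functions are continuous, we can conclude that

$$\int_{-1}^{w} e^{\lambda z(e^{it} - 1)}\, dF_M(z) \to \int_{-1}^{w} e^{\lambda z(e^{it} - 1)}\, dF(z). \tag{3.3}$$

So for $n$ large enough

$$\Big|\int_{-1}^{w} e^{\lambda z(e^{it} - 1)}\, dF_M(z) - \int_{-1}^{w} e^{\lambda z(e^{it} - 1)}\, dF(z)\Big| < \epsilon/2. \tag{3.4}$$

Because of (3.2) and (3.4) we now have

$$\Big|\int_{-1}^{w} e^{\frac{n}{M}z(e^{it} - 1)}\, dF_M(z) - \int_{-1}^{w} e^{\lambda z(e^{it} - 1)}\, dF(z)\Big|$$

$$\le \Big|\int_{-1}^{w}\big(e^{\frac{n}{M}z(e^{it} - 1)} - e^{\lambda z(e^{it} - 1)}\big)\, dF_M(z)\Big| + \Big|\int_{-1}^{w} e^{\lambda z(e^{it} - 1)}\, dF_M(z) - \int_{-1}^{w} e^{\lambda z(e^{it} - 1)}\, dF(z)\Big| < \epsilon. \tag{3.5}$$

Next choose the continuity point $w$ such that $1 - F(w) < \epsilon/2$. Since $F_M(w) \to F(w)$
we also have $0 \le 1 - F_M(w) < \epsilon/2$, for $n$ large enough. This implies

$$\Big|\int_{w}^{\infty} e^{\frac{n}{M}z(e^{it} - 1)}\, dF_M(z) - \int_{w}^{\infty} e^{\lambda z(e^{it} - 1)}\, dF(z)\Big| \le \int_{w}^{\infty}\big|e^{\frac{n}{M}z(e^{it} - 1)}\big|\, dF_M(z) + \int_{w}^{\infty}\big|e^{\lambda z(e^{it} - 1)}\big|\, dF(z)$$

$$\le 1 - F_M(w) + 1 - F(w) < \epsilon. \tag{3.6}$$

The inequalities (3.5) and (3.6) show that

$$\Big|\int e^{\frac{n}{M}z(e^{it} - 1)}\, dF_M(z) - \int e^{\lambda z(e^{it} - 1)}\, dF(z)\Big| < 2\epsilon \tag{3.7}$$

for $n$ large enough. Hence

$$\int e^{\frac{n}{M}z(e^{it} - 1)}\, dF_M(z) \to \int e^{\lambda z(e^{it} - 1)}\, dF(z). \tag{3.8}$$

Since convergence of characteristic functions is uniform on bounded intervals we also
have (1.18).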
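3.2 Proof of Lemma 2.2

The proof is similar to the proof in the previous section. Note that

$$\lim_{n \to \infty}\frac{n}{m}\,z\,\big(e^{it\frac{m}{n}} - 1\big) = itz, \tag{3.9}$$

uniformly for $z \in [-1, w]$. Let $\epsilon > 0$ and $w$ be a continuity point of $F$. By the Helly-Bray
theorem and (3.9), we have, for $n$ large enough,

$$\Big|\int_{-1}^{w} e^{\frac{n}{m}z(e^{it\frac{m}{n}} - 1)}\, dF_m(z) - \int_{-1}^{w} e^{itz}\, dF(z)\Big|$$

$$\le \Big|\int_{-1}^{w} e^{\frac{n}{m}z(e^{it\frac{m}{n}} - 1)}\, dF_m(z) - \int_{-1}^{w} e^{itz}\, dF_m(z)\Big| + \Big|\int_{-1}^{w} e^{itz}\, dF_m(z) - \int_{-1}^{w} e^{itz}\, dF(z)\Big| < \epsilon. \tag{3.10}$$

Now choose $w$ such that $1 - F(w) < \epsilon/2$. Then, for $n$ large enough,

$$\Big|\int_{w}^{\infty} e^{\frac{n}{m}z(e^{it\frac{m}{n}} - 1)}\, dF_m(z) - \int_{w}^{\infty} e^{itz}\, dF(z)\Big| \le \int_{w}^{\infty}\big|e^{\frac{n}{m}z(e^{it\frac{m}{n}} - 1)}\big|\, dF_m(z) + \int_{w}^{\infty}|e^{itz}|\, dF(z)$$

$$\le 1 - F_m(w) + 1 - F(w) < \epsilon. \tag{3.11}$$

As in the previous section the inequalities (3.10) and (3.11) prove the lemma.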



3.3 Proof of Remark 1.1

We assume that the set of the $p_{j,M}$'s is ordered. So $p_{1,M} \le p_{2,M} \le \cdots \le p_{M,M}$. Let
$x$ be a continuity point of $F$. We want to show that

$$|F_M(x) - F_m(x)| = \Big|\frac{1}{M}\sum_{j=1}^{M} I_{[Mp_{j,M} \le x]} - \frac{1}{m}\sum_{i=1}^{m} I_{[mq_{i,M} \le x]}\Big| \tag{3.12}$$

vanishes since this implies that $F_m(x) \to F(x)$ follows from $F_M(x) \to F(x)$.

Assume that in the $\beta$ first groups of the $m$, $\beta = 0, \ldots, m$, we have $Mp_{j,M} \le x$ and
that in the $(\beta+1)$th group for the first $\alpha$, $\alpha = 1, \ldots, k$, of the $p_{j,M}$'s we have
$Mp_{j,M} \le x$ and that for the others $Mp_{j,M} > x$. Then in total exactly $k\beta + \alpha$ of the
$p_{j,M}$'s satisfy $Mp_{j,M} \le x$. Note that both $\beta$ and $\alpha$ depend on $M$ and $x$.

Let us focus on the $i$-th group, where $i = 1, \ldots, \beta$. Then we have $Mp_{j,M} \le x$ for all
$j = k_{i-1} + 1, \ldots, k_{i-1} + k = k_i$ and hence for all $i = 1, \ldots, \beta$

$$M\sum_{j=k_{i-1}+1}^{k_i} p_{j,M} \le kx.$$

This implies $mq_{i,M} \le x$.

We can now bound the difference (3.12). We get

$$\Big|\frac{1}{M}\sum_{j=1}^{M} I_{[Mp_{j,M} \le x]} - \frac{1}{m}\sum_{i=1}^{m} I_{[mq_{i,M} \le x]}\Big| \le \Big|\frac{k\beta + \alpha}{M} - \frac{\beta}{m}\Big| + \frac{1}{m} = \frac{\alpha}{M} + \frac{1}{m} \to 0, \tag{3.13}$$

since $\alpha/M \le k/M \to 0$.



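3.4 Proof of Lemma 2.4

Let $\delta_n = (m/n)^{1/2 - \alpha}$ and let $A_n$ denote the event

$$\Big|\frac{m}{n}\bar\nu_{j,M} - z_{j,n}\Big| \le \delta_n \quad\text{and}\quad \Big|\frac{m}{n}\bar\rho_{j,M} - z_{j,n}\Big| \le \delta_n, \quad j = 1, \ldots, m.$$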



Then, as in (|2.5|) we have, for n large enough 



n 



m 



n 



Vj,M z j,n 



n 



< 4mexp(-^ m2c + J 
n \ 2a 1 

— logm 



>S„) P 

V) 



?? 



>5n 



= 4exp 

< 4 exp f — log m f 



m/ c 

1 n 2a 



c m 2a log m 



< 



Using (2.8) we write

$$E\,(\hat F_m(x) - \tilde F_m(x))^2 = E\,(\hat F_m(x) - \tilde F_m(x))^2 I_{A_n} + E\,(\hat F_m(x) - \tilde F_m(x))^2 I_{A_n^c}$$

$$\le \Big(\frac{1}{m}\sum_{j=1}^{m}\big(I_{[z_{j,n} \le x + \delta_n]} - I_{[z_{j,n} \le x - \delta_n]}\big)\Big)^2 + P(A_n^c) = \big(F_m(x + \delta_n) - F_m(x - \delta_n)\big)^2 + P(A_n^c). \tag{3.14}$$

Now recall that $F_m$ is the empirical distribution function based on the values $mq_{j,M}$,
$j = 1, \ldots, m$. If $g'(x) > 0$ then each of these values are order $1/m$ apart. Hence there are
order $\delta_n/(1/m) = m\delta_n$ values in the interval $(x - \delta_n, x + \delta_n]$, each contributing $1/m$
to the probability. So

$$F_m(x + \delta_n) - F_m(x - \delta_n) = O(\delta_n). \tag{3.15}$$

Hence 

E (F m (x) - F m (x)) 2 = 0{8 2 n ) + 0{\), (3.16) 
which completes the proof of the lemma. 